A virtual machine approach to fuzzing interfaces
In Redpanda, the core data structure used to move nearly all bytes that flow through the system is called iobuf. Because it is used in so many contexts, the iobuf interface has grown over time and become more complex in order to accommodate a variety of use cases. Needless to say, the correctness of this utility is critical. But how do we go about testing it? How can we be sure that we are exercising all of its edge cases?
Enter automated fuzzing. We’ve been writing fuzz tools at Redpanda Data since the early days, and they’ve generally been based on a harness that executes a random schedule like the following snippet:
struct op {
    virtual ~op() = default;
    // Each operation mutates the state of the system under test.
    virtual void execute() = 0;
};
struct op_a : op { void execute() override { /* ... */ } };
struct op_b : op { void execute() override { /* ... */ } };

std::vector<op*> ops;
generate_random_schedule(ops);
for (auto* op : ops) {
    op->execute();
    // Check the system's invariants after every step.
    verify();
}
Here each op is an action that modifies the system state. The operations are run in a random order, and after each action completes, the state of the system is verified. What these operations are and what constitutes the verification step is completely dependent on the target of the testing. For example, it could be the iobuf interface, or it could be an entire subsystem.
Thinking back, this was a very effective technique that helped us quickly find and address rare issues that may have otherwise been found by users. We were always aware that this random generation wasn’t particularly sophisticated, and that it had a tendency to generate uninteresting schedules. But run long enough, these tools would eventually reveal an edge case or behavior we had not considered.
I had been vaguely aware that fuzzing was a large area of study in computer science, with well-known folks talking about it in the context of their research. But it was only recently that I tried to use fuzzing tools myself and discovered just how rich the area is with prior work. See https://github.com/wcventure/FuzzingPaper for a taste. It turns out the topic of fuzzing nerd-sniped me for a solid week. But I walked away from that diversion with a long list of fuzzing-related topics that I want to dive deeply into, as well as what I think is a solid approach to guided fuzzing for iobuf. That’s what we are going to look at in the remainder of this post.
The fuzzing target #
The target of our fuzzing is going to be a data structure called iobuf. Effectively, an iobuf can be thought of as a contiguous byte sequence, but one optimized to support operations that would be inefficient if the bytes were stored physically contiguously (e.g. random insertion). Under the hood, an iobuf stores its data in variable-sized segments as a means of dealing with memory fragmentation and of efficiently supporting various types of operations by reducing the amount of data movement. Here is an abbreviated version of the iobuf interface:
class iobuf {
public:
void reserve_capacity(size_t);
void append(const char *, size_t);
void prepend(const char *, size_t);
// zero-copy an extent
iobuf share(size_t, size_t) const;
const_iterator cbegin() const;
const_iterator cend() const;
...
};
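Before moving on, here is a small usage sketch to make the interface concrete. It is illustrative only, and sticks to the abbreviated listing above rather than the full header:
iobuf buf;
buf.append("world", 5);
buf.prepend("hello ", 6);
// Zero-copy share of the first five bytes; both iobufs now reference the
// same underlying fragments via reference counting.
iobuf greeting = buf.share(0, 5);
// Walk the underlying storage without copying the bytes.
for (auto it = buf.cbegin(); it != buf.cend(); ++it) {
    // inspect a fragment
}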
The iobuf type is used just about everywhere data is passed around: arriving off of a network interface, being read from disk, or being sent to another core. It also contains a lightweight mechanism for performing zero-copy sharing using reference counting, which adds a dimension of complexity to the implementation.
Given how important iobuf is and how non-trivial its implementation is, we’d like to explore ways to increase our confidence in its correctness by subjecting it to a variety of synthetic workloads. Additionally, we’d like to avoid a naive exploration of the state space, and instead generate interesting schedules that are more efficient at poking a fuzz target. Guided fuzzing using a tool like Clang’s libfuzzer is one way to do this.
Quick intro to libfuzzer #
Guided fuzzing is useful because, in contrast to naively generating a random schedule of operations, guided fuzzers can generate a schedule after observing the behavior of the program under previous schedules. This allows guided fuzzers to target goals such as increasing code coverage, and do so more efficiently.
In this post we’ll consider Clang’s fuzzing framework, libfuzzer. The main interface that libfuzzer provides is the following function. It serves as an upcall: libfuzzer invokes it with an opaque blob of bytes that represents a single instance of a test. Our task as users of libfuzzer is to turn these bytes into an experiment that tests the fuzzed target. How the input is transformed into an experiment is completely dependent on what is being tested, be it a compression algorithm, a data format, or, in our case, iobuf.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
// do something interesting with {data, size}
}
I’m still quite new to guided fuzzers, but I have the impression that the task of deciding how to wrangle the fuzzer’s input data into an interesting experiment is a creative process that represents the majority of the work required to fuzz a target.
Covering fuzzing in general, or even libfuzzer specifically, is beyond the scope of this post. But there are a lot of great resources out there. When I was just getting started, these were (and still are) really useful:
- https://github.com/google/fuzzing/blob/master/tutorial/libFuzzerTutorial.md
- https://github.com/google/oss-fuzz
- https://llvm.org/docs/LibFuzzer.html
- https://aflplus.plus
The virtual machine approach #
When I was first learning about using guided fuzzers I had no clue at all how I should turn the opaque data that the fuzzer provided into something useful. In retrospect I was probably close to abandoning the idea and filing it away as not being useful for cases like iobuf, where the goal is testing an API surface area.
Luckily a colleague of mine, https://github.com/felixguendling, had been down this path before and offered the advice of using an approach that modeled the problem as a virtual machine. In this approach the input data from libfuzzer is interpreted as a program expressed as bytecode.
Effectively, we let libfuzzer design the programs that we execute!
Operations #
Okay, so a virtual machine interpreter. Let’s start with the op codes. I created one op code for each iobuf operation of interest. Here is an abbreviated list:
enum class op_type : uint8_t {
moves,
trim_front,
hexdump,
reserve_memory,
prepend_temporary_buffer,
append_iobuf,
append_char_array,
max = append_char_array,
};
Some operations, like hexdump, do not require any operands, while others, like append_char_array, depend on additional operands such as a source of bytes. To keep things simple, the following op_spec type is used to represent an instance of any operation and its operands. We wrap all data in a std::string_view so that we have access to a convenient API for manipulating byte strings, and so that we avoid unnecessary memory copies.
struct op_spec {
op_type op;
size_t size;
std::string_view data;
explicit op_spec(op_type op)
: op(op) {}
};
Now we need a way to turn libfuzzer’s input data into a sequence of operations. Let’s walk through the key pieces from the bottom up. Consider the code snippet below. First, the input data from libfuzzer is used to construct an instance of driver, where it is stored as the string view program_. And, really going all in with the virtual machine analogy, the program counter pc_ points to the beginning of the input data.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
std::string_view program(reinterpret_cast<const char*>(data), size);
driver vm(program);
...
}
class driver {
const std::string_view program_;
std::string_view::iterator pc_;
public:
explicit driver(std::string_view program)
: program_(program)
, pc_(program.cbegin()) {}
...
};
At a high level, the driver reads operations from the input program until the end of the program is reached. There are two methods used to drive this decoding process. The first, read&lt;T&gt;, shown in the snippet below, is used to decode integers from the input program. It copies the bytes pointed to by the program counter into an instance of T, advances the program counter, and returns the value. When the end of the program is reached, an end_of_program exception is thrown. This exception, as we’ll see later, is a signal to cleanly stop execution, because in general the input program will not be perfectly aligned with the operations being decoded. Without this trick, nearly all inputs would represent invalid programs.
template<typename T>
T read() {
if (std::distance(pc_, program_.cend()) < sizeof(T)) {
throw end_of_program();
}
T ret;
std::memcpy(&ret, pc_, sizeof(T));
std::advance(pc_, sizeof(T));
return ret;
}
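The end_of_program type itself isn’t shown here; a minimal definition, which is all the driver needs, could be as simple as:
// Thrown by read<T>() when there aren't enough bytes left in the input
// program; the driver catches it to stop execution cleanly.
struct end_of_program : std::exception {};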
The second method used to drive execution is next(). First, read&lt;T&gt; is used to decode the next integer from the program and interpret it as an op code. The integer is forced into the range of valid op codes; otherwise the fuzzing would be very inefficient, since a huge proportion of the possible integer values would be invalid op codes in this particular interpreter.
op_spec next() {
// next byte is the op code
using underlying = std::underlying_type_t<op_type>;
const auto max_op = static_cast<underlying>(op_type::max);
// the next op
auto op = op_spec{static_cast<op_type>(
read<underlying>() % (max_op + 1))};
And finally, the operands. Those are also going to be pulled out of the input program. I’ve abbreviated the cases for illustration; they are grouped into two categories: operations that need only a size operand, and those that need both a size and data. Any data for operands is taken directly from the input in order to avoid large memory allocations, so the program size itself bounds memory consumption.
// needs size operand
switch (op.op) {
case op_type::trim_front:
case op_type::append_char_array: {
op.size = read<uint32_t>();
break;
}
default:
break;
}
// needs data operand
switch (op.op) {
case op_type::append_fragments:
case op_type::append_char_array:
op.size = std::min<size_t>(
op.size, std::distance(pc_, program_.end()));
op.data = {pc_, op.size};
std::advance(pc_, op.data.size());
break;
default:
break;
}
return op;
}
...
};
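To make the decoding concrete, here is a hypothetical worked example (it assumes the op code values match the abbreviated enum above and a little-endian host). The following eight bytes of input decode into a single append_char_array operation:
// Hypothetical input program, as interpreted by next():
//   0x06                 -> 6 % 7 == op_type::append_char_array
//   0x03 0x00 0x00 0x00  -> size operand (uint32_t) == 3
//   'a' 'b' 'c'          -> data operand "abc"
const uint8_t sample_program[] = {0x06, 0x03, 0x00, 0x00, 0x00, 'a', 'b', 'c'};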
To recap, we now have an input program and a method for converting it into a sequence of operations. Now we need to apply those operations. Here is an updated version of the libfuzzer upcall method that we saw earlier. As shown below, driver::operator() is invoked until a stopping condition is reached: either a clean shutdown or an exception.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
// NOLINTNEXTLINE
std::string_view d(reinterpret_cast<const char*>(data), size);
driver vm(d);
try {
while (vm()) {
vm.check();
}
} catch (...) {
vm.print_trace();
throw;
}
return 0;
}
The driver::operator() method is quite simple: it either converts the end_of_program exception into a false return value that signals a clean shutdown, or it takes the successfully decoded operation, passes it to handle_op, and then signals to the caller that there is more input to process.
bool operator()() {
const auto op = [this]() -> std::optional<op_spec> {
try {
return next();
} catch (const end_of_program&) {
return std::nullopt;
}
}();
if (op.has_value()) {
handle_op(op.value());
return true;
}
return false;
}
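One detail I’ve glossed over is vm.print_trace(), which the harness calls when an operation throws. Its implementation isn’t shown in this post, but conceptually it just needs to dump the schedule that was executed so a failing input can be understood and reproduced. A hypothetical sketch, assuming next() records each decoded operation in a trace_ vector, might look like:
// Hypothetical sketch: trace_ is assumed to be a std::vector<op_spec> that
// next() appends to as each operation is decoded.
void print_trace() const {
    for (const auto& op : trace_) {
        fmt::print(
          "op={} size={} data_len={}\n",
          static_cast<int>(op.op),
          op.size,
          op.data.size());
    }
}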
Next we’ll take a look at handle_op and see how an operation is run, and what it means for an operation to succeed or fail.
Testing fuzz target properties #
So far we have seen some basic infrastructure for setting up libfuzzer to generate schedules of operations to test an interface. But what exactly are we going to test? This is a core challenge when fuzzing a target.
In our case with iobuf we’d like to do two things. First, we want to test that operations on iobuf behave as expected. That is, if some data is appended, for example, then no matter what that data is or what state the iobuf is in, the iobuf instance should reflect that the data was correctly appended.
Second, we want the fuzzer to achieve full code coverage of the iobuf implementation and to find interesting executions and edge cases. We want this coverage because it is useful to know that we are, at the very least, testing code that we believed was useful for some situation. Another reason is that we are going to turn on all of the sanitizers, which are exceptionally good at finding things like undefined behavior, but we need coverage for the sanitizers to be effective.
Let’s return to driver::handle_op. All this method does is forward the decoded operations to an instance of the type iobuf_ops, where all the important bits live.
class driver {
    iobuf_ops m_;

    void handle_op(op_spec op) {
        switch (op.op) {
        case op_type::copy:
            m_.copy();
            return;
        case op_type::moves:
            m_.moves();
            return;
        ...
        }
    }
};
The iobuf_ops type is conceptually very simple. It contains two members: an iobuf instance, and a reference implementation of the iobuf interface. The sequence of operations decoded from the libfuzzer input by the driver we designed above will target both containers. After an operation is applied, the equivalence of the two containers is checked. If they are different, then there is a mistake somewhere!
class iobuf_ops {
    iobuf buf;
    std::deque<std::string_view> ref;
    ...
};
The reference implementation is intentionally designed to be as simple as possible, in order to make visual inspection of its behavior clear, and to avoid any correlated behavior between it and the iobuf implementation. Below are two example operations: appending data from an input, and trimming data from the front of the containers. When appending data, the iobuf may coalesce data into existing fragments or make allocation decisions based on heuristics. The reference implementation, however, merely appends the input data to the std::deque container.
void append_char_array(std::string_view payload) {
buf.append(payload.data(), payload.size());
ref.push_back(payload);
}
void trim_front(size_t size) {
buf.trim_front(size);
while (size && !ref.empty()) {
if (size >= ref.front().size()) {
size -= ref.front().size();
ref.pop_front();
} else {
ref.front().remove_prefix(size);
return;
}
}
}
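The operand-free operations aren’t shown, but the pattern is the same. Purely as an illustration (this is a guess at the shape, not the actual implementation), an operation like moves could round-trip the iobuf through its move constructor and move assignment, leaving the reference container untouched since the logical contents don’t change:
// Hypothetical sketch of an operand-free operation: exercise iobuf's move
// semantics without altering the logical byte sequence.
void moves() {
    iobuf tmp(std::move(buf));
    buf = std::move(tmp);
}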
After each operation is completed, the check method is called. As we can see below, it performs some sanity checks on various properties. At the end, check_equals() is called to verify that the two containers are byte-for-byte equivalent. I’ll omit that method here because it isn’t related to fuzzing and is a little complicated.
void check() const {
const auto ref_size = std::accumulate(
ref.begin(),
ref.end(),
size_t(0),
[](size_t acc, std::string_view sv) { return acc + sv.size(); });
if (buf.empty() != (ref_size == 0)) {
throw std::runtime_error(fmt::format(
"Iobuf empty state {} != reference empty state {}",
buf.empty(),
ref_size == 0));
}
if (buf.size_bytes() != ref_size) {
throw std::runtime_error(fmt::format(
"Iobuf size {} != reference size {}",
buf.size_bytes(),
ref_size));
}
check_equals();
}
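For completeness, a naive version of that equivalence check could simply flatten both containers and compare the results. This is not the real check_equals() (which is more careful and avoids the copies), and it assumes the iobuf fragments reached by iteration expose get() and size() accessors:
// Naive illustration only: materialize both byte sequences and compare them.
void check_equals() const {
    std::string expected;
    for (const auto& sv : ref) {
        expected.append(sv.data(), sv.size());
    }
    std::string actual;
    for (const auto& frag : buf) {
        actual.append(frag.get(), frag.size());
    }
    if (expected != actual) {
        throw std::runtime_error("iobuf bytes differ from reference bytes");
    }
}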
So how well did we do?
Results #
We did exceptionally well! We managed to get something like 99.9% code coverage, and even found some dead code. There are a few spots that coverage didn’t reach, and we aren’t yet sure if they are just hard to reach or if they are also dead code. That is an ongoing investigation.
In addition to the coverage, we found three real bugs. One example: schedules containing memory reservations were more likely to produce incorrect behavior. This was because some functions did not take into account that an internal fragment may exist to provide capacity while being logically empty. We don’t have any reason to believe these conditions were ever encountered in practice, but they could have been: making a memory reservation is a common optimization in our code base.
The one major area of the iobuf implementation that was not fuzzed is the set of futurized interfaces that must be run in the context of a Seastar reactor thread. I’ll be posting a new article on how we handle that scenario.